feat(eval): seed hand-graded golden cases so the benchmark can run (GEPA) by Victor-David-Medina · Pull Request #660 · Relay-Launch/relaylaunch-console

Victor "David" Medina (Victor-David-Medina) · 2026-07-02T04:18:00Z

GEPA benchmark: seed the hand-graded golden cases so it can actually run

Check-before-build first. The scoped task was "build the GEPA golden-set (recovery-draft + morning-brief)." Reading the eval system first showed that would be 100% duplication - the golden set already exists:

lib/eval/golden-cases-recovery-v1.ts - 56 graded cases across all 4 revenue workflows (lapsed-winback, estimate-recovery, review-lift, slot-rescue), each with an ideal reference_verdict + graded dimensions.
lib/eval/golden-cases-v1.ts - 50 graded cases including morning-brief, churn-prediction, isa-routing, content, council.

The real gap (a genuine "non-working" section)

Those 106 cases are imported by nothing except the lib/eval barrel. The eval/PGR benchmark reads golden cases from the evaluation_golden_cases Supabase table (golden-dataset.ts), and scripts/run-first-eval.ts bails immediately:

if (!cases || cases.length === 0) {
  console.log("No active golden cases found. Seed cases first.");
  process.exit(0);
}

There is no seed that bridges the static constants into that table. The graded content was orphaned from the runtime path - the benchmark could not run at all.

The fix (wire it live, don't rebuild)

lib/eval/golden-seed.ts - pure source: getSeedGoldenCases() (combines both sets) + toGoldenCaseInsert() (maps a static GoldenCase to a DB row; drops the client string id since the table auto-generates a UUID, drops timestamps). No IO, unit-testable.
scripts/seed-golden-cases.ts - idempotent seed: upsert-by-title (safe to re-run, never deletes), --dry-run preview, founder-gated on prod (writes to Supabase).
__tests__/golden-seed.test.ts - pure (no DB): the combined set is complete, titles are unique (the dedup key), the graded workflows including morning-brief are covered, each case maps cleanly to the insert row, and every case is actually graded (not schema-only).
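A minimal sketch of the pure seed source described above, not the code in this PR. The `GoldenCase` fields, the sample data, and the helpers `findDuplicateTitles` and `isGraded` are assumptions for illustration; only the names `getSeedGoldenCases` and `toGoldenCaseInsert` come from the description.

```typescript
// Hypothetical shape: the real GoldenCase type in lib/eval may carry more fields.
interface GoldenCase {
  id: string; // client-side string id, dropped on insert
  title: string; // dedup key for the seed
  workflow: string;
  reference_verdict: string;
  dimensions: Record<string, number>; // graded dimensions; empty means ungraded
  created_at?: string; // timestamp, dropped on insert
}

// Row shape sent to the table: no id (the table is assumed to generate a UUID), no timestamps.
type GoldenCaseInsert = Omit<GoldenCase, "id" | "created_at">;

// Tiny stand-ins for GOLDEN_CASES_V1 and GOLDEN_CASES_RECOVERY_V1 (sample data, not real cases).
const SAMPLE_V1: GoldenCase[] = [
  {
    id: "mb-1",
    title: "Morning brief: quiet day",
    workflow: "morning-brief",
    reference_verdict: "pass",
    dimensions: { clarity: 5 },
  },
];
const SAMPLE_RECOVERY_V1: GoldenCase[] = [
  {
    id: "lw-1",
    title: "Winback: 90-day lapse",
    workflow: "lapsed-winback",
    reference_verdict: "pass",
    dimensions: { tone: 4 },
    created_at: "2026-01-01T00:00:00Z",
  },
];

// Combine both static sets. No IO, so this is trivially unit-testable.
function getSeedGoldenCases(): GoldenCase[] {
  return [...SAMPLE_V1, ...SAMPLE_RECOVERY_V1];
}

// Map a static case to an insert row, copying only the fields the row keeps.
function toGoldenCaseInsert(c: GoldenCase): GoldenCaseInsert {
  return {
    title: c.title,
    workflow: c.workflow,
    reference_verdict: c.reference_verdict,
    dimensions: c.dimensions,
  };
}

// Titles are the dedup key, so a test can assert this returns an empty list.
function findDuplicateTitles(cases: GoldenCase[]): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  for (const c of cases) {
    if (seen.has(c.title)) dupes.add(c.title);
    seen.add(c.title);
  }
  return Array.from(dupes);
}

// "Actually graded" is read here as: has a verdict and at least one scored dimension.
function isGraded(c: GoldenCase): boolean {
  return c.reference_verdict.length > 0 && Object.keys(c.dimensions).length > 0;
}
```

Keeping the mapper free of database calls is what lets the test file assert on completeness, title uniqueness, and grading without a Supabase connection.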

After merge: npx tsx scripts/seed-golden-cases.ts → npx tsx scripts/run-first-eval.ts → the "we grade ourselves" PGR benchmark is live. Honest at proof_events=0: golden cases are representative fixtures grading draft quality, not recovered dollars.
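Why re-running the seed is safe can be sketched with an in-memory stand-in for the table. This is an illustration under stated assumptions, not the script in this PR: `SeedRow`, `SeedResult`, and `seedByTitle` are hypothetical names, a `Map` keyed by title replaces the evaluation_golden_cases table, and titles are assumed unique within one batch.

```typescript
// Hypothetical row shape: only the fields needed to show the upsert logic.
interface SeedRow {
  title: string; // dedup key
  reference_verdict: string;
}

interface SeedResult {
  inserted: number;
  updated: number;
  dryRun: boolean;
}

// Upsert-by-title: insert when the title is new, overwrite when it exists.
// Nothing is ever deleted, and a dry run reports counts without writing.
function seedByTitle(
  table: Map<string, SeedRow>,
  rows: SeedRow[],
  dryRun: boolean,
): SeedResult {
  let inserted = 0;
  let updated = 0;
  for (const row of rows) {
    if (table.has(row.title)) {
      updated += 1;
    } else {
      inserted += 1;
    }
    if (!dryRun) {
      table.set(row.title, row);
    }
  }
  return { inserted, updated, dryRun };
}

// Sample usage with made-up rows.
const demoTable = new Map<string, SeedRow>();
const demoRows: SeedRow[] = [
  { title: "Winback: 90-day lapse", reference_verdict: "pass" },
  { title: "Morning brief: quiet day", reference_verdict: "pass" },
];

const preview = seedByTitle(demoTable, demoRows, true); // table stays empty
const firstRun = seedByTitle(demoTable, demoRows, false); // both rows inserted
const secondRun = seedByTitle(demoTable, demoRows, false); // both rows updated in place
console.log(preview, firstRun, secondRun, demoTable.size);
```

In this sketch the second real run changes no row count, which is the property that makes a seed safe to repeat before running the eval.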

Generated with Claude Code by RelayLaunch

…n (GEPA)

Check-before-build found the golden set ALREADY EXISTS (106 hand-graded cases: GOLDEN_CASES_V1 + GOLDEN_CASES_RECOVERY_V1, covering the 4 revenue workflows + morning-brief/churn/isa), so building new golden cases would be pure duplication.

The REAL gap: those cases live only as static TS constants, imported by nothing but the barrel. The eval/PGR benchmark reads from the evaluation_golden_cases Supabase table, and run-first-eval.ts bails with 'No active golden cases found. Seed cases first.' The graded content was orphaned from the runtime path - the benchmark could not run.

This bridges them: lib/eval/golden-seed.ts (pure source + DB-row mapper) + scripts/seed-golden-cases.ts (idempotent upsert-by-title, --dry-run, founder-gated on prod). Now seed once -> run-first-eval establishes PGR baselines -> the 'we grade ourselves' benchmark is live. Honest at proof=0: golden cases are representative fixtures grading draft QUALITY, not recovered dollars.

Tests (pure, no DB): combined set complete + unique titles (the dedup key) + covers the graded workflows + maps cleanly to the insert row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-07-02T04:26:38Z

🛡️ Cascade Quality Score: 100/100

Category	Score	Status
TypeScript	20/20	✅
ESLint	20/20	✅
Brand Compliance	15/15	✅
Test Suite	25/25	✅
Build	20/20	✅

Threshold: 85/100 | Result: PASS ✅

Victor "David" Medina (Victor-David-Medina) wants to merge 1 commit into main from claude/gepa-golden-seed
